Abstract

We present an empirical analysis of targeted attacks 
against a human-rights Non-Governmental Organization 
(NGO) representing a minority living in China. In par- 
ticular, we analyze the social engineering techniques, at- 
tack vectors, and malware employed in malicious emails 
received by two members of the NGO over a four-year 
period. We find that both the language and topic of 
the emails were highly tailored to the victims, and that 
sender impersonation was commonly used to lure them 
into opening malicious attachments. We also show that 
the majority of attacks employed malicious documents 
with recent but disclosed vulnerabilities that tend to 
evade common defenses. Finally, we find that the NGO 
received malware from different families and that over a 
quarter of the malware can be linked to entities that have
been reported to engage in targeted attacks against polit- 
ical and industrial organizations, and Tibetan NGOs. 

1 Introduction 

In the last few years, a new class of cyber attacks has 
emerged that is more targeted at individuals and organi- 
zations. Unlike their opportunistic, large-scale counter- 
parts, targeted attacks aim to compromise a handful of 
specific, high-value victims. These attacks have received 
substantial media attention, and have successfully com- 
promised a wide range of targets including critical na- 
tional infrastructures [19], Fortune 500 companies [23], 
news agencies [20], and political dissidents [10, 11, 16].

Despite the high stakes involved in these attacks, the 
ecosystem sustaining them remains poorly understood. 
The main reason for this lack of understanding is that vic- 
tims rarely share the details of a high-profile compromise 
with the public, and they typically do not disclose what 
sensitive information has been lost to the attackers.
According to conventional wisdom, targeted attacks are
generally state-sponsored. Examples of national
organizations that have been reported to engage in
targeted attacks include the NSA's Office of Tailored
Access Operations (TAO) [3] and the People's Liberation
Army's Unit 61398 [15]. Recently, researchers also
attributed attacks in the Middle East to the governments
of Bahrain, Syria, and the United Arab Emirates [16].

There now exists public evidence that virtually every 
computer system connected to the internet is susceptible 
to targeted attacks. The Stuxnet attack even compromised
an air-gapped Iranian nuclear facility [19] and
was able to damage the centrifuges in that facility. More
recently, Google, Facebook, the New York Times, and 
many other global companies have been compromised 
by targeted attacks. Furthermore, political dissidents and 
Non-Governmental Organizations (NGOs) are also being 
targeted [10, 11, 16]. 

In this paper, we analyze 1,493 suspicious emails col- 
lected over a four-year period by two members of the 
World Uyghur Congress (WUC), an NGO representing 
an ethnic group of over ten million individuals mainly 
living in China. WUC volunteers who suspected that 
they were being specifically targeted by malware shared 
the suspicious emails that they received with us for
analysis. We find that these emails contain 1,176 malicious
attachments and target 724 unique email addresses
belonging to individuals affiliated with 108 different
organizations. This result indicates that, despite their targeted
content, these attacks were sent to several related victims 
(e.g., via Cc). Although the majority of these targeted or- 
ganizations were NGOs, they also comprised a few high- 
profile targets such as the New York Times and US em- 
bassies. 

We leverage this dataset to perform an empirical
analysis of targeted attacks in the wild. First, we analyze
the social engineering techniques and find that the language
and topic of the malicious emails were tailored to the
mother tongue and level of specialization of the victims.
We also find that sender impersonation was common and
that some attacks in our dataset originated from
compromised email accounts belonging to high-profile
activists. Second, whereas recent studies report that
malicious archives and executables represented the majority
of the targeted-attack threat [15, 22], we find that mali- 
cious documents were the most common attack vector in 
our dataset. Although we do not find evidence of zero- 
day vulnerabilities, we observe that most attacks used re- 
cent vulnerabilities, that exploits were quickly replaced 
to adapt to new defense mechanisms, and that they of- 
ten bypassed common defenses. Third, we perform an 
analysis of the first-stage malware delivered over these 
malicious emails and find that WUC has been targeted 
with different families of malware over the last year. We 
find that over a quarter of these malware samples exhib- 
ited similarities with those used by entities reported to 
have carried out targeted attacks. 

Our work complements existing reports on targeted
attacks such as GhostNet, Mandiant, and the Symantec
Internet Security Threat Report (ISTR) 2013 [11, 15, 22].
Whereas the GhostNet and Mandiant reports focus on the
attack lifecycle after the initial compromise, this study
provides an in-depth analysis of the reconnaissance performed
before the compromise. We note that both approaches have
pros and cons and are complementary: While it is hard 
for the authors of these reports to know how a system be- 
came compromised in retrospect, it is equally hard for us 
to know if the observed attacks will compromise the tar- 
geted system(s). Finally, whereas ISTR provides some 
numbers about reconnaissance analysis for industrial- 
espionage attacks [22], we present a thorough and rig- 
orous analysis of the attacks in our dataset. 

To foster research in this area, we also release our
dataset of targeted malware to the community [4].

Scope. Measuring real-world targeted attacks is chal- 
lenging and this paper has a number of important bi- 
ases. First, our dataset contains mainly attacks against 
the Uyghur and human-rights communities. While the 
specifics of the social engineering techniques (e.g., use 
of Uyghur language) will vary from one targeted com- 
munity to another, we argue that identifying commonly 
used techniques (e.g., topic, language, senders' imper- 
sonation) and their purpose is a necessary step towards 
designing effective defenses. Another limitation of our 
dataset is that it captures only targeted attacks carried out 
over email channels and that were detected by our vol- 
unteers. Although malicious emails seem to constitute 
the majority of targeted attacks, different attack vectors 
such as targeted drive-by downloads are equally
important. Finally, we reiterate that the goal of this study is to
understand the reconnaissance phase occurring before a
compromise. Analyzing second-stage malware, monitoring
compromised systems, and determining the purpose
of targeted attacks are all outside the scope of this
paper and are the topic of recent related work [10, 16]. We
discuss open research challenges in Section 6.



From: ...

Date: Mon, Fldr 4, 2013 at 8:58 AM

Subject: Invitation Letter of WUC International Conference 
To: ...

Dear .... 

I am writing to you from the World Uyghur Congress (WUC) and on behalf
of the Unrepresented Nations and Peoples Organization (UNPO) and the 
Society for Threatened Peoples (STP) with financial support from the 
National Endowment for Democracy, cordially invites you to attend the 
WUC's upcoming Conference which will be held in Geneva between 11th
and 13th March 2013.

Attached you can find the invitation letter. We hope you will give a 
positive consideration to this invitation, and look forward to meeting 
you in Geneva. During your stay in Geneva, travel, accommodation and 
food are covered by the WUC. 

The WUC is a nonprofit organization granted by the National Endowment 
for Democracy in Washington, DC to peacefully promote human rights, 
democracy and freedom for the Uyghur people in East Turkestan. 

If you have any questions or queries regarding your participation,
please do not hesitate to contact me. Phone: Fax: e-mail: 

Sincerely,

Figure 1: Screenshot of a malicious email with an
impersonated sender and a malicious document that exploits
Common Vulnerabilities and Exposures (CVE) entry
CVE-2012-0158 and contains malware. The email replays
an actual announcement about a conference in Geneva
and was edited by the attacker to add that all fees
would be covered.

2 Overview 

Context. WUC, the NGO from which we have received 
our dataset, represents the Uyghurs, an ethnic minority 
concentrated in the Xinjiang region in China. Xinjiang 
is the largest Chinese administrative division, has abun- 
dant natural resources such as oil, and is China's largest 
natural gas-producing region. WUC frequently engages
in advocacy, meets with politicians and diplomats
at the EU and UN, and collaborates with a variety
of NGOs. Rebiya Kadeer, WUC's current president, was
the fifth richest person in China before her imprisonment
for dissent in 1999, and is now in exile in the US.
Finally, WUC is partly funded by the National Endowment
for Democracy (NED), a US NGO itself funded by the 
US Congress to promote democracy. (We will see below 
that NED has been targeted with the same malware as 
WUC.) 

WUC has been a regular target of Distributed De- 
nial of Service (DDoS) attacks and telephone disrup- 
tions, as well as targeted attacks. For example, the 
WUC's website became inaccessible from June 28 to 
July 10, 2011 due to such a DDoS attack. Concurrently 
with this attack, the professional and private phone lines of
WUC employees were flooded with incoming calls, and 
the WUC's contact email address received 15,000 spam 
emails in one week. 

Data acquisition. In addition to these intermittent 
threats, WUC employees constantly receive suspicious 
emails impersonating their colleagues and containing 
malicious links and attachments. These emails
consistently evade spam and malware defenses deployed by
webmail providers and are often relevant to WUC's ac- 
tivities. In fact, our volunteers claim that the emails are 
often so targeted that they need to confirm their legiti- 
macy with the impersonated sender in person. For ex- 
ample, Figure 1 shows the screenshot of such an email 
that replays the actual announcement for a conference in 
Geneva organized by WUC. As a result, WUC members 
are wary of any emails containing links or attachments, 
and some of them save these emails for future
inspection. We came into contact with two WUC employees who
shared the suspicious emails that they had received (with 
consent from WUC). The authors of this work were not 
involved in the data collection. 

Characteristics of the dataset. The two volunteers 
shared with us the headers and content of 1,493 suspi- 
cious emails that they received over a four-year period. 
1,178 (79%) of these emails were sent to the private 
email addresses of the two NGO employees from whom 
we obtained the data, 16 via the public email address of 
the WUC, and the remaining 299 emails were forwarded 
to them (126 of these by colleagues at WUC). Overall, 
89% of these emails were received directly by our volun- 
teers or their colleagues at WUC. As we will see below, 
they also contain numerous email addresses in the To and 
Cc fields belonging to individuals that are not affiliated 
with WUC. 

The emails contained 209 links and 1,649 attachments,
including 1,176 with malware (247 RAR, 49 ZIP, 144
PDF, 655 Microsoft Office, and 81 files in other
formats). Our analysis revealed 1,116 malicious emails
containing malware attachments. (We were not able to
verify the maliciousness of the links, as most of them
were invalid by the time we obtained the data.) In the
following, we analyze malicious emails exclusively, and we
use the term malicious archives for RAR or ZIP files and
malicious documents for PDF or Microsoft Office files.
Finally, the volunteers labeled the data wherever necessary,
enabling us, for example, to establish that the sender was
impersonated in 84% of the malicious emails. Table 1
summarizes the main characteristics of these malicious emails.
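
To make the attachment breakdown above easier to check, the following Python sketch recomputes each format's share of the 1,176 malicious attachments, as well as the share of the archive and document groups just defined. The counts are copied from the text; the names `MALICIOUS_ATTACHMENTS`, `GROUPS`, and `share` are illustrative only.

```python
# Malicious attachments by format, as reported in the text (1,176 in total).
MALICIOUS_ATTACHMENTS = {
    "RAR": 247,
    "ZIP": 49,
    "PDF": 144,
    "Microsoft Office": 655,
    "Other": 81,
}

# Grouping used in the text: malicious archives vs. malicious documents.
GROUPS = {
    "archives": ("RAR", "ZIP"),
    "documents": ("PDF", "Microsoft Office"),
}


def share(n, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


total = sum(MALICIOUS_ATTACHMENTS.values())
per_format = {fmt: share(n, total) for fmt, n in MALICIOUS_ATTACHMENTS.items()}
per_group = {
    group: share(sum(MALICIOUS_ATTACHMENTS[f] for f in fmts), total)
    for group, fmts in GROUPS.items()
}

print(total)  # 1176
print(per_format)
print(per_group)
```

With these counts, documents make up roughly two thirds of the malicious attachments (67.9%) and archives roughly a quarter (25.2%).
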
Scope of the dataset. Analyzing the headers of the ma- 
licious emails revealed a surprisingly large number of re- 
cipients in the To or Cc fields. In particular, we observed 
that malicious emails had been sent to 1,250 unique 
email addresses and 157 organizations. A potential
explanation for this behavior could be that the attacker
tampered with the email headers (e.g., via a compromised
SMTP server) as part of the social engineering, so that these
emails were delivered only to our volunteers despite
the additional listed recipients. To test this hypothesis,
we considered only those emails received directly
by our volunteers, originating from well-known webmail
domains (i.e., aol.com, gmx.de, gmx.com, gmail.com,
googlemail.com, hotmail.com, outlook.com, and
yahoo.com), and verified via Sender Policy Framework
(SPF) and DomainKeys Identified Mail (DKIM). SPF
and DKIM are commonly used to authenticate email: SPF
lets a domain declare which servers may send mail on its
behalf, and DKIM lets the sending domain cryptographically
sign a message. By verifying that these malicious emails
originated from well-known webmail servers, we obtain
568 malicious emails whose headers are very unlikely to
have been tampered with by the attacker. By repeating our
above analysis on these emails only, we obtain 724 unique
email addresses and 108 organizations. Other organizations
besides WUC include NED (WUC's main source of funding
and itself funded by the US Congress), the New York Times,
and US embassies. In summary, while we obtained our 
dataset from two volunteers working for a single orga- 
nization, it offers substantial coverage not only of one 
NGO, but also of those attacks against multiple NGOs in 
which attackers target more than one organization with 
the same email. We show the full list of organizations 
targeted in our dataset in Appendix A. 
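
The header-trust filter described above can be sketched with the Python standard library. This is an illustrative sketch, not necessarily how the verification was performed for this study: it assumes that the receiving provider recorded its SPF and DKIM verdicts in an `Authentication-Results` header (RFC 8601) at the top of the message, and the helper names (`sender_domain`, `spf_and_dkim_pass`, `headers_trustworthy`) are hypothetical.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr

# Webmail domains listed in the text.
TRUSTED_WEBMAIL = {
    "aol.com", "gmx.de", "gmx.com", "gmail.com",
    "googlemail.com", "hotmail.com", "outlook.com", "yahoo.com",
}


def sender_domain(msg):
    """Lower-cased domain of the From: address ('' if absent)."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.rpartition("@")[2].lower()


def spf_and_dkim_pass(msg):
    """Check the topmost Authentication-Results header for spf=pass and dkim=pass.

    Assumption: the topmost header was added by the receiving provider; headers
    further down could have been inserted by the sender and are ignored. This
    sketch does not verify the authserv-id or DKIM domain alignment.
    """
    results = msg.get_all("Authentication-Results") or []
    if not results:
        return False
    text = str(results[0]).lower()
    return bool(re.search(r"\bspf=pass\b", text)) and bool(
        re.search(r"\bdkim=pass\b", text)
    )


def headers_trustworthy(raw_message: bytes) -> bool:
    """True if the message comes from a listed webmail domain and passes SPF and DKIM."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return sender_domain(msg) in TRUSTED_WEBMAIL and spf_and_dkim_pass(msg)
```

Only messages for which `headers_trustworthy` returns `True` would then be used when counting the recipients listed in the To and Cc fields.
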

What are targeted attacks? There is no precise defini- 
tion of targeted attacks. In this paper, we loosely define 
these attacks as low-volume, socially engineered com- 
munication which entices specific victims into installing 
malware. In the dataset we analyze here, the communi- 
cation is by email, and the mechanism of exploitation is 
primarily using malicious archives or documents. A tar- 
geted victim, in this work, refers to specific individuals, 
or an organization as a whole. When necessary, we also 
use the term volunteer(s) to distinguish between our two 
collaborators and other victims. 

The terms targeted attacks and Advanced Persistent 
Threats (or APTs) are often used interchangeably. As 
this paper focuses on the reconnaissance phase of tar- 
geted attacks (occurring before a compromise), we can- 
not measure how long attackers would have remained in 
control of the targeted systems (i.e., their persistence).
As a result, we simply refer to these attacks as targeted 
attacks, and not APTs, throughout the rest of this pa- 
per. We discuss specific social engineering characteris- 
tics that make targeted attacks difficult to detect by un- 
suspecting average users in Section 3, the attack vectors 
used in our dataset in Section 4, and the malware fam- 
ilies they install in Section 5. Finally, we will discuss 
open research challenges in Section 6. 

Ethics. The dataset was collected prior to our contact- 
ing WUC and for the purpose of future security analysis. 
Furthermore, WUC approved the disclosure of all the in- 
formation contained in this paper and requested that the 
organization's name not be anonymized. 
Table 1: Summary of our dataset originating from two volunteers. Malicious indicates the fraction of emails containing
malware, Impersonated the fraction of emails with an impersonated sender, and # recipients and # orgs the number of
unique email addresses that were listed in the To and Cc fields of the malicious emails and the corresponding number
of organizations, respectively.





               Beginning - end         Size    Malicious           Impersonated     # recipients   # orgs
1st volunteer  Sept 2012 - Sept 2013   98 MB   154/241 (64%)       141/154 (92%)    124            25
2nd volunteer  Sept 2009 - Jul 2013    818 MB  962/1,252 (77%)     802/962 (83%)    666            102
Total          Sept 2009 - Sept 2013   916 MB  1,116/1,493 (75%)   943/1,116 (84%)  724            108
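
The percentages in Table 1 follow directly from the raw counts. The short Python sketch below recomputes them as a consistency check; the dictionary layout and the helper names `pct` and `summarize` are illustrative assumptions.

```python
# Raw counts from Table 1: (malicious emails, total emails, impersonated senders).
TABLE_1 = {
    "1st volunteer": (154, 241, 141),
    "2nd volunteer": (962, 1252, 802),
}


def pct(part, whole):
    """Percentage rounded to the nearest integer, as printed in Table 1."""
    return round(100 * part / whole)


def summarize(rows):
    """Return per-row (Malicious %, Impersonated %) plus the summed raw counts."""
    out = {}
    for name, (mal, total, imp) in rows.items():
        # Malicious is relative to all emails; Impersonated to malicious emails.
        out[name] = (pct(mal, total), pct(imp, mal))
    # Email counts are disjoint across volunteers, so they can be summed.
    mal = sum(r[0] for r in rows.values())
    total = sum(r[1] for r in rows.values())
    imp = sum(r[2] for r in rows.values())
    out["Total"] = (pct(mal, total), pct(imp, mal))
    return out, (mal, total, imp)


shares, totals = summarize(TABLE_1)
print(totals)  # (1116, 1493, 943)
print(shares)
```

The recipient and organization columns, by contrast, cannot be summed this way: the two volunteers' lists overlap (124 + 666 = 790 exceeds the 724 unique addresses in the Total row).
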



3 Analysis of social engineering 

The GhostNet, Mandiant, ISTR, and other reports [11,
15, 22] mention the use of socially engineered emails to
lure their victims into installing malware, clicking on
malicious links, or opening malicious documents. For
example, the GhostNet report refers to one spoofed email
containing a malicious DOC attachment, and the Mandiant
report to one email sent from a webmail account
bearing the name of the company's CEO enticing several 
employees to open malware contained in a ZIP archive. 
Concurrent work reports the use of careful social engi- 
neering against civilians and NGOs in the Middle East 
[16] and also Tibetan and human-rights NGOs [10]. De- 
spite this anecdotal evidence, we are not aware of any 
rigorous and thorough analysis of the social engineering 
techniques employed in targeted attacks. In this section, 
we seek to answer the following questions in the context 
of our dataset: 

• What social traits of victims are generally ex- 
ploited? Do attackers generally impersonate a 
sender known to the victim and if so who do they 
choose to impersonate? 

• Who are the victims? Are malicious emails sent
only to specific individuals, to entire organizations, 
or communities of users? 

• When are users being targeted? When do users 
start being targeted? Are the same users frequently 
being targeted and for how long? Are several 
users from the same organization being targeted 
simultaneously? 



3.1 Methodology 

The analysis below focuses on 1,116 malicious emails
received between 2009 and 2013.

Topics and language. To understand how well the
attacker knows his victims, we manually categorized
(coded) the emails by topic and language. (Unless
indicated otherwise, the analysis below was performed
on emails that were coded by one of the authors.) The
topic was determined by reading the emails' titles and
bodies and, in cases where emails were not written in
English, we also used an online translation service. Emails
whose topic was still unclear after using the translator
were labeled as Unknown.

Targeted victims. To determine the targeted victims
of these attacks, we searched for the email addresses and
full names of the senders and receivers of the malicious
emails originating from trustworthy SMTP servers.
When available, we used their public profiles on social
media websites such as Google, Facebook, and Skype to
determine their professional positions and organizations.
We assume we have found the social profile of a victim
if one of the following three rules applies (in that order):
first, if the social profile refers directly to the email
address seen in the malicious email; second, if the social
profile refers to an organization whose domain matches
the victim's email address; or third, if we find contextual
evidence that the social profile is linked to WUC, Uyghurs,
or the topic of the malicious email. Out of 724 victims'
email addresses, we found the profile of 32% (237), 4% (30),
and 23% (167) using the first, second, and third rule,
respectively.
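
The ordered matching rules above can be expressed as a small function that returns the first rule linking a profile to an address. This is a hypothetical sketch: the `Profile` structure is an assumption, and rule 3 ("contextual evidence"), which required human judgment, is approximated here by a crude keyword overlap using the hypothetical `CONTEXT_TERMS` set.

```python
from dataclasses import dataclass

# Hypothetical stand-in for "contextual evidence" (rule 3); in practice this
# judgment is manual, not keyword matching.
CONTEXT_TERMS = {"wuc", "world uyghur congress", "uyghur"}


@dataclass
class Profile:
    emails: set      # email addresses shown on the public profile
    org_domain: str  # domain of the organization named on the profile ("" if none)
    keywords: set    # terms extracted from the profile text


def match_rule(victim_email, profile, topic_terms=()):
    """Return 1, 2, or 3 for the first rule linking profile and address, else None."""
    addr = victim_email.lower()
    domain = addr.rpartition("@")[2]
    # Rule 1: the profile lists the exact address seen in the malicious email.
    if addr in {e.lower() for e in profile.emails}:
        return 1
    # Rule 2: the profile's organization domain matches the address's domain.
    if profile.org_domain and profile.org_domain.lower() == domain:
        return 2
    # Rule 3: contextual evidence, approximated here by keyword overlap with
    # WUC/Uyghur terms or with terms describing the email's topic.
    terms = {k.lower() for k in profile.keywords}
    context = CONTEXT_TERMS | {t.lower() for t in topic_terms}
    if terms & context:
        return 3
    return None
```

Tallying the returned rule numbers over all 724 addresses would yield per-rule counts of the kind reported above; checking the rules in order ensures that each address is credited to the strongest rule that applies.
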
Organizations and industries. In the following, WUC 
refers to victims directly affiliated with the
organization (including our volunteers). Other Uyghur NGOs
include Uyghur associations in Australia, Belgium, Canada,
Finland, France, Japan, the Netherlands, Norway, Sweden,
and the UK. Other NGOs include non-profit organizations such
as Amnesty International, Reporters Without Borders, 
and Tibetan NGOs. Academia, Politics, and Business 
contain victims working in these industries. Finally, Un- 
known corresponds to victims for which we were not able 
to determine an affiliation. 

Ranks. We also mapped the professional positions
of the victims into one of three categories: High,
Medium, and Low profile. We consider leadership
positions such as chairpersons, presidents, and
executives as high-profile, other positions such as
assistants and IT personnel as medium-profile, and
email addresses with unknown owners as well as shared
email addresses (e.g., an NGO's contact address) as low-profile.
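
The ranking scheme above amounts to a simple mapping from a position and an address to a rank. In this hypothetical Python sketch, the leadership keywords come from the examples in the text, while the set of shared-mailbox usernames (`SHARED_LOCAL_PARTS`) is an illustrative assumption.

```python
# Leadership keywords taken from the examples in the text.
HIGH_KEYWORDS = ("chair", "president", "executive")
# Hypothetical usernames treated as shared (organizational) mailboxes.
SHARED_LOCAL_PARTS = {"info", "contact", "office", "admin"}


def rank(position, address):
    """Map a victim's position and address to 'High', 'Medium', or 'Low'."""
    local = address.lower().partition("@")[0]
    if not position or local in SHARED_LOCAL_PARTS:
        return "Low"  # unknown owner/position, or a shared mailbox
    title = position.lower()
    # Crude substring match: e.g. "executive assistant" would be ranked High.
    if any(k in title for k in HIGH_KEYWORDS):
        return "High"
    return "Medium"  # any other known position (assistants, IT staff, ...)
```

A manual review, as done in this study, avoids the keyword pitfalls noted in the comments; the sketch only makes the three-way mapping explicit.
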
Figure 2: Distribution of the topics of the malicious
emails for each year of the dataset shared by our two
volunteers. The left bar corresponds to the data shared
by both volunteers, and the next two bar groups to each
year of the data shared by our first and second volunteer,
respectively. The content of malicious emails is
targeted to the victims.

Figure 3: Distribution of languages for each year of our
dataset. Malicious emails employ the language of their
victims.



Impersonation. Finally, to understand the social con- 
text of the attack, each of our volunteers coded (based 
on her experience within the organization) all the email 
addresses of the senders into one of five categories: 
Spoofed, Typo, Name, Suspicious, or Unknown. (Coding 
was done based exclusively on the personal knowledge 
of the volunteers.) An email is marked as Spoofed if it 
bears the exact sender email address of a person known 
to our volunteers, as Typo if it resembles a sender email 
address known to the receiver but is not identical, and as 
Name if the attacker used the full name of a volunteer's 
contact (with a different email address). Finally, email 
addresses that look as if they had been generated by 
a computer program (e.g., uiow839djs93j@yahoo.com) 
are labeled as Suspicious and all remaining emails as
Unknown. Our assumption is that, because our volunteers
received most of the malicious emails directly, they were 
likely to recognize cases where their contacts were be- 
ing impersonated. We note that labeling is conservative: 
Our volunteers may sometimes label Spoofed or Typo ad- 
dresses as Unknown because they do not know the person 
impersonated in the attack. This may happen, for exam- 
ple, in cases where they were not the primary target of 
the attack (e.g., they appeared in Cc). 
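
In this study the coding was done manually by the volunteers. Purely as an illustration of the five categories, the Python sketch below shows how a similar coding could be approximated automatically. Several elements are assumptions rather than part of the study: the order in which the categories are checked, the edit-distance threshold of 2 for Typo (a swap of two adjacent characters costs 2 in Levenshtein distance), and the `looks_generated` heuristic for Suspicious addresses.

```python
from email.utils import parseaddr


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def looks_generated(local_part):
    """Hypothetical heuristic for machine-generated usernames."""
    digits = sum(c.isdigit() for c in local_part)
    return len(local_part) >= 10 and digits >= 3 and any(c.isalpha() for c in local_part)


def code_sender(from_header, contacts, max_typo=2):
    """Assign Spoofed, Typo, Name, Suspicious, or Unknown.

    contacts maps a contact's lower-cased full name to their lower-cased address.
    """
    name, addr = parseaddr(from_header)
    addr = addr.lower()
    known = set(contacts.values())
    if addr in known:
        return "Spoofed"  # exact address of a known contact
    if any(edit_distance(addr, k) <= max_typo for k in known):
        return "Typo"     # near-identical to a known address
    if name.strip().lower() in contacts:
        return "Name"     # contact's full name with a different address
    if looks_generated(addr.partition("@")[0]):
        return "Suspicious"
    return "Unknown"
```

For example, with a hypothetical contact `jane.doe@example.org`, the address `jane.deo@example.org` is coded as Typo, and `uiow839djs93j@yahoo.com` (the example given above) as Suspicious. Like the volunteers' labels, such a sketch can only recognize impersonation of contacts it knows about.
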
Limitations. Our dataset originates from WUC and is 
limited to those victims that were targeted together with 
that organization. We will see that these victims were of- 
ten NGOs. As a result, the social engineering techniques 
observed here may differ from attacks against different 
entities such as companies, political institutions, or even 
other NGOs. Despite these limitations, we argue that this 
analysis is an important first step towards understanding 
the human factors exploited by targeted attacks. 

3.2 Results 

In this subsection, we discuss the results of our analy- 
sis of the social engineering techniques used in the mali- 
cious emails. 

Topics and language. The topic of malicious emails in 
our dataset can generally be classified into one of three 
categories: WUC, Uyghur, and human rights. In
particular, 51% (575) of malicious emails pertained
to WUC, 29% (326) to Uyghurs, 12% (139) to
human rights, and 3% (28) to other topics. In addition,
the native language of the victim is often used in the ma- 
licious emails. In fact, 69% (664) of the emails sent to the 
second volunteer were written in the Uyghur language, 
and 62% (96) for the first one. These results indicate that 
attackers invested significant effort to tailor the content 
of the malicious emails to their victims, as we see in Fig- 
ure 2 and Figure 3. 

Specialized events. In addition to being on topic, we 
also observed that emails often referred to specific events 
that would only be of interest to the targeted victims. 
Throughout our dataset, 46% (491) of the events referenced
in malicious emails were organizational events (e.g.,
conferences). We note that these references are generally
much more specialized than those used in typical phishing
and other profit-motivated attacks. For example, Figure 1
shows a screenshot of an attack that replayed the announcement
of a conference on a very specialized topic. The malicious
email was edited by the attacker to add that all fees
would be covered (probably to raise the target's interest).
Figure 4: Distribution of senders' impersonation
techniques for each year of our dataset. Malicious emails
spoof the email address of a contact of the volunteers,
use a very similar address controlled by the attacker,
or use a contact's full name.

Figure 5: Distribution of impersonated senders' ranks for
each year of our dataset. Malicious emails often
impersonate high-profile individuals.

Impersonation. We find that attackers used carefully
crafted email addresses to impersonate high-profile
identities that the victims may know directly. Specifically,
attackers relied on four techniques to add legitimacy to
a malicious email. First, 41% (465) of the email addresses
contained typos (i.e., the email address resembles a known
sender address, but with minor, subtle differences). These
email addresses are identical to legitimate ones except for
a few characters being swapped, replaced, or added in the
username. Second, 12% (134) of the senders' full names
corresponded to existing contacts of the volunteers. Third,
we find that most email addresses belonged to well-known
email providers, Google being the most prominent with 58%
of all emails using the Gmail or GoogleMail domains,
followed by Yahoo with 16%.

Fourth, we find that 30% (337) of the sender emails 
were spoofed (i.e., the email was sent from the address of 
a person that the volunteer knows). This observation sug- 
gests that the attacker had knowledge of the victim's so- 
cial context, and had either spoofed the email header, or 
compromised the corresponding email account. To iden- 
tify a subset of compromised email accounts, we con- 
sider spoofed emails authenticated by the senders' do- 
mains using both SPF and DKIM. To reduce the chances 
of capturing compromised servers instead of compromised
accounts, we also consider only well-known, trustworthy
domains such as Gmail. This procedure yields
malicious emails that were likely sent from the legitimate
accounts of the victims' contacts. We found that three
email accounts belonging to prominent activists, including
two of WUC's ten leaders, had been compromised
and were being used to send malicious emails. We have
alerted these users and are currently working with them 
to deploy defenses and more comprehensive monitoring 
techniques, as we will discuss in Section 6. 
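
The selection of likely compromised accounts described above combines the volunteers' Spoofed label with authentication results and a trusted-domain check. The Python sketch below illustrates that combination under stated assumptions: the `CodedEmail` record layout is hypothetical, and the webmail list from Section 2 stands in for "well-known, trustworthy domains".

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: the webmail domains listed in Section 2 stand in for
# "well-known, trustworthy domains".
TRUSTED_DOMAINS = {
    "aol.com", "gmx.de", "gmx.com", "gmail.com",
    "googlemail.com", "hotmail.com", "outlook.com", "yahoo.com",
}


@dataclass
class CodedEmail:
    sender: str      # From: address
    label: str       # volunteer's code: Spoofed, Typo, Name, Suspicious, Unknown
    spf_pass: bool   # SPF verdict recorded by the receiving server
    dkim_pass: bool  # DKIM verdict recorded by the receiving server


def compromised_candidates(emails):
    """Count, per sender, Spoofed emails that passed SPF and DKIM on a trusted domain."""
    hits = Counter()
    for e in emails:
        sender = e.sender.lower()
        domain = sender.rpartition("@")[2]
        if (e.label == "Spoofed" and e.spf_pass and e.dkim_pass
                and domain in TRUSTED_DOMAINS):
            hits[sender] += 1
    return hits
```

The output is a list of candidates, not proof: a passing SPF and DKIM result on a trusted webmail domain suggests that the message really left the contact's own account, which is why such accounts warrant follow-up with their owners.
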

We show the distributions of malicious emails sent
with spoofed, typo, name, suspicious, or unknown email
addresses in Figure 4, and the ranks of the impersonated
senders in Figure 5. (We do not show the corresponding
ranks for receivers because NGOs generally function
with a handful of employees, all playing a key role in the
organization.)

Figure 6: Distribution of languages employed to write
about the main topics of malicious emails. There is a
strong correlation between malicious emails' topics
and the language in which they are written.

Figure 7: Timeline of attacks, in number of malicious
emails per month, against the 60 most targeted victims
(our two volunteers' rows are shaded and the vertical
line corresponds to one of our volunteers joining WUC).
The Y axis represents victims grouped by organization.
ETUE corresponds to the East Turkestan Union in Europe
NGO and Others to different organizations. Each of
the top 60 victims has been frequently attacked over
the last four years, and several victims from the same
NGOs were attacked simultaneously.

Targeted victims. For the analysis below, which leverages
other recipients besides our two volunteers, we further
filter emails to keep only those originating from
well-known domains (as described in Section 2). Doing
this leaves us with 568 malicious emails that are likely
to have indeed been sent to all the email addresses in the
header. We find that the attacks target more organizations
than WUC, including 38 Uyghur NGOs, 28 Other
NGOs, as well as 41 Journalistic, Academic, and Political
organizations. (See Appendix A for the complete
list of targeted organizations.) Interestingly, we find a
strong correlation between the topic of an email and the
language in which the email was written, as we show in
Figure 6. Our results show that English became increasingly
common as the topic became less specialized.
We hypothesize that attackers may have sent the
same email messages to several recipients with similar
interests to reduce the costs involved in manually crafting
these emails.
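
The topic/language relationship shown in Figure 6 is, at its core, a cross-tabulation of two manual codes. The Python sketch below computes, for each topic, the fraction of emails written in each language. It is a minimal sketch on hypothetical labels, not our data, and it does not compute any particular correlation statistic.

```python
from collections import Counter, defaultdict


def language_shares(coded_emails):
    """coded_emails: iterable of (topic, language) labels, one pair per email.

    Returns {topic: {language: fraction of that topic's emails}}.
    """
    counts = defaultdict(Counter)
    for topic, language in coded_emails:
        counts[topic][language] += 1
    return {
        topic: {lang: n / sum(c.values()) for lang, n in c.items()}
        for topic, c in counts.items()
    }


# Hypothetical coded labels (not our data).
sample = [
    ("WUC", "Uyghur"), ("WUC", "Uyghur"), ("WUC", "English"),
    ("Human rights", "English"),
]
shares = language_shares(sample)
print(shares)
```

Plotting these per-topic fractions as stacked bars gives a figure of the kind shown in Figure 6. A shift toward English for less specialized topics would appear as a growing English fraction.
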

Timing. Our dataset shows that the same victims were 
frequently targeted and that several members of the 
same organization were routinely targeted simultane- 
ously. This suggests that attackers were using a "spray" 
strategy, trying to find the weakest links in the targeted 
organization, and hence optimizing their chance of
success. Spraying is clearly visible in Figure 7, where we
see that the top 60 most targeted victims in our dataset
received malicious emails frequently over the last four years.
(We note that the dataset shared by one of our volunteers
starts in August 2012, explaining why we observe
more malicious emails after that date.) We also see that
31 email accounts from individuals without affiliation to
WUC were often targeted simultaneously with the WUC
accounts.
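
The timing analysis involves two computations: per-victim monthly counts (the rows of Figure 7) and the detection of emails that reach several members of one organization at once. The Python sketch below illustrates both. The input format (a send date plus a recipient list per email) and the recipient-to-organization mapping `org_of` are assumptions, and the example data is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_timeline(emails):
    """emails: iterable of (sent_date, recipients).

    Returns {recipient: Counter({"YYYY-MM": number of malicious emails})},
    i.e. one row of a Figure 7 style timeline per recipient.
    """
    timeline = defaultdict(Counter)
    for sent, recipients in emails:
        month = sent.strftime("%Y-%m")
        for r in set(recipients):  # count each recipient once per email
            timeline[r][month] += 1
    return timeline


def simultaneous_hits(emails, org_of):
    """Per organization, count emails that reached two or more of its members at once."""
    hits = Counter()
    for _, recipients in emails:
        per_org = Counter(org_of[r] for r in set(recipients) if r in org_of)
        for org, members_hit in per_org.items():
            if members_hit >= 2:
                hits[org] += 1
    return hits


# Hypothetical example data (not from our dataset).
emails = [
    (date(2013, 3, 4), ["a@orga.example", "b@orga.example", "c@orgb.example"]),
    (date(2013, 3, 20), ["a@orga.example"]),
    (date(2013, 4, 1), ["c@orgb.example", "d@orgb.example"]),
]
org_of = {"a@orga.example": "OrgA", "b@orga.example": "OrgA",
          "c@orgb.example": "OrgB", "d@orgb.example": "OrgB"}
print(monthly_timeline(emails)["a@orga.example"])
print(simultaneous_hits(emails, org_of))
```

In this toy example, each organization has one email that reached two of its members, which is the "spray" pattern described above.
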



Summary of Findings. We now revisit the initial ques- 
tions posed at the beginning of this section. First, we 
saw that most emails in our dataset pertained to WUC, 
Uyghurs, or human-rights, were written in the recipi- 
ent's mother tongue, and often referred to very special- 
ized events. We also found that sender impersonation 
was common and that some email accounts belonging to 
WUC's leadership were compromised and used to spread 
targeted attacks. (We note that many more accounts may 
be compromised but remain dormant or do not appear 
as compromised in our dataset.) Second, we showed 
that numerous NGOs were being targeted simultaneously 
with WUC and that the specialization of emails var- 
ied depending on the recipient(s). Finally, we observed 
that the most targeted victims received several malicious 
emails every month and that attacks were sprayed over 
several organizations' employees. 

4 Analysis of attack vectors 

We now analyze the techniques used to execute arbitrary
code on the victim's computer. The related work
reports the use of malicious links, email attachments, and
IP tracking services [10, 16]. Whereas ISTR 2013 reports
that EXE files are widely used in targeted attacks, and
the Mandiant report that ZIP is the predominant format
that they have observed in the last several years, we find
that these formats represent 0% and 4% (49) of malicious
attachments in our dataset, respectively. Instead, we find
RAR archives and malicious documents to be the most
common attack vectors. Hypotheses that may explain
these discrepancies with the Mandiant report include the
tuning of attack vectors to the defense mechanisms
used by different populations of email users (e.g.,
NGOs vs. corporations); Mandiant's attacker (APT1)
mainly using primitive attack vectors such as archives;
and/or Mandiant having excluded more advanced attack
vectors, such as documents, from its report. However, in
the absence of empirical data on APT1's attack vectors,
we cannot test these hypotheses. In this section, we
perform a quantitative study of the attack vectors employed
in our dataset, and also analyze their dynamics. We seek 
to answer the following questions: 

• What attack vectors are being employed against 
WUC? Do they generally rely only on human fail- 
ures or also on software vulnerabilities? Do they 
evolve in time and if so, how quickly do they adapt 
to new defense mechanisms? 

• What is the efficacy of existing countermeasures? 
As all malicious documents in our dataset used 
well-known vulnerabilities, would commercial, 
state-of-the-art defenses have detected all of them? 
Figure 8: Number of malicious documents containing a given vulnerability (CVE) (left panel, "Targeted vulnerabilities") and target application (right panel, "Targeted applications") for each month of our dataset. The top four CVEs by number of attacks over the whole trace are shown individually, and the others are shown in aggregate. The vertical line in November 2010 corresponds to the deployment of sandboxing in Acrobat Reader. Although Acrobat Reader was the most targeted application until 2010, recent attacks mainly target the Office suite.



4.1 Methodology 

Malicious archives. To analyze the archives' contents, we extracted them in a disconnected VM environment and manually inspected them to determine the type of files they contain, independently of their extensions. For EXE files, we also manually checked whether their Microsoft Windows icons mimicked those of other file formats (e.g., JPEG), a trick meant to persuade average users that the files were not executable.
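The content-based type check described above can be sketched in Python by comparing a file's leading "magic" bytes against known signatures. This is a minimal illustration, not the tooling used in our analysis (which relied on manual inspection in a VM); the signature table and the set of executable extensions are assumptions. Icon masquerading, the second check above, requires parsing PE resources and is not covered here.

```python
import os

# Leading "magic" bytes for the formats discussed in this section.
# Illustrative only: real file-type identification covers far more formats.
MAGIC_SIGNATURES = [
    (b"MZ", "exe"),                                # DOS/Windows PE executable (EXE, DLL)
    (b"%PDF", "pdf"),
    (b"{\\rtf", "rtf"),                            # RTF documents start with "{\rtf"
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # legacy Office (DOC, XLS, PPT)
    (b"PK\x03\x04", "zip"),                        # ZIP, also used by DOCX/XLSX
    (b"Rar!\x1a\x07", "rar"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff_type(data: bytes) -> str:
    """Return a coarse file type derived from content, ignoring the file name."""
    for magic, kind in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"


def is_disguised_executable(filename: str, data: bytes) -> bool:
    """True if the content is an executable but the extension claims otherwise."""
    extension = os.path.splitext(filename)[1].lower()
    return sniff_type(data) == "exe" and extension not in (".exe", ".dll", ".scr")
```

For example, a file named `holiday.jpg` whose content begins with `MZ` is flagged, while `setup.exe` with the same content is not.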

Malicious documents. We used two methodologies to determine the characteristics of the vulnerabilities exploited by malicious documents. First, we submitted the documents to VirusTotal [1] for analysis. Each of the 45 antivirus engines (AVs) on VirusTotal classified the submitted sample as benign or malicious and attached a "tag" describing auxiliary information about the sample. Often the tag is a Common Vulnerabilities and Exposures (CVE) number, presumably corresponding to the signature that matched. In other cases, the tag is not a CVE: it is either "unknown" or a symptomatic description, such as the inclusion of a suspicious OLE object. We refer to these three kinds of tags as CVE, Unknown, and Heuristic, respectively. Often all AVs reported a Single CVE, but sometimes they reported Multiple, conflicting CVEs. Once we collected all CVE tags, we scraped the National Vulnerability Database [18] to obtain the release date and vulnerable applications for each CVE that we found.
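The tag categories above can be expressed as a small classifier. The sketch below is an assumption about how such bookkeeping could be done, not our actual scripts: it treats any tag containing a CVE identifier as CVE, any tag containing "unknown" as Unknown, and everything else as Heuristic, then labels a sample Single or Multiple depending on how many distinct CVEs its tags name. Real AV tag formats vary by vendor, so the regular expression is illustrative, and the "NoCVE" label is hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches identifiers such as "CVE-2012-0158" or "CVE_2012_0158" inside AV tags.
CVE_RE = re.compile(r"CVE[-_]?(\d{4})[-_](\d{4,})", re.IGNORECASE)


def categorize_tag(tag: str) -> tuple[str, str | None]:
    """Map one AV tag to ("CVE", normalized id), ("Unknown", None) or ("Heuristic", None)."""
    match = CVE_RE.search(tag)
    if match:
        return "CVE", f"CVE-{match.group(1)}-{match.group(2)}"
    if "unknown" in tag.lower():
        return "Unknown", None
    return "Heuristic", None


def summarize_sample(tags: list[str]) -> tuple[str, list[str], Counter]:
    """Summarize the tags that all AVs attached to one sample."""
    counts: Counter = Counter()
    cves: set[str] = set()
    for tag in tags:
        category, cve = categorize_tag(tag)
        counts[category] += 1
        if cve is not None:
            cves.add(cve)
    if len(cves) == 1:
        agreement = "Single"      # all CVE tags agree
    elif len(cves) > 1:
        agreement = "Multiple"    # conflicting CVEs, to be resolved manually
    else:
        agreement = "NoCVE"       # only Unknown/Heuristic tags
    return agreement, sorted(cves), counts
```

Samples labeled Multiple are the ones that would need the manual, taint-assisted resolution described next.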

Second, we inspected the documents manually to confirm that they contain malware, and used taint-assisted analysis both to verify the accuracy of the CVEs reported by AVs and to investigate the presence of zero-day vulnerabilities.¹ The methodological details of our taint-assisted manual analysis are described in Appendix B.

Defenses. We performed a retrospective analysis of the protection offered by common defenses, such as AVs and webmail providers, against our malicious documents. For AVs, we used VirusTotal to determine whether each AV's scanning engine detects a malicious document, as described above. For webmail channels, we created an email account on each of GMail, Hotmail, and Yahoo, and used a dedicated SMTP server to send emails with malicious documents attached to these accounts. We considered malicious documents delivered without modification as undetected by the webmail defenses. Conversely, if an email or its attachment was dropped, or if the attachment's payload was modified, we considered it detected. The analyses based on webmails and VirusTotal were performed in November 2013 and July 2014, respectively.
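The detection rule for webmails can be stated precisely in code. In this minimal sketch, sending and retrieving the messages (for example, over SMTP and IMAP) is assumed to have happened already; the function only compares the attachment that was sent with what, if anything, arrived. Comparing SHA-256 digests is one choice among several; a byte-for-byte comparison would give the same verdicts.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def webmail_verdict(sent: bytes, received: Optional[bytes]) -> str:
    """Classify one delivery under the rule described above.

    `received` is None when the email or its attachment was dropped.
    """
    if received is None:
        return "detected"        # email or attachment dropped
    if sha256_hex(received) != sha256_hex(sent):
        return "detected"        # attachment payload modified in transit
    return "undetected"          # delivered without modification


def detection_rate(deliveries: Iterable[Tuple[bytes, Optional[bytes]]]) -> float:
    """Fraction of (sent, received) pairs that a provider 'detected'."""
    verdicts = [webmail_verdict(sent, received) for sent, received in deliveries]
    if not verdicts:
        return 0.0
    return verdicts.count("detected") / len(verdicts)
```

Under this rule, any rewriting of the attachment's bytes by the provider counts as a detection, even if the rewrite was not intended as a defense.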

Limitations. First, as with social engineering, our analysis of attack vectors is biased towards NGOs. In addition, the above methodology is limited to the attack vectors captured in our dataset. For example, we miss attacks against the NGOs' web servers unless the corresponding malicious link appears in the suspicious emails.

Second, our taint-assisted analysis of vulnerabilities is limited to those documents for which we were able to analyze the logs manually. For example, we found that opening PDF files in our environment generated log files that were far too large for manual analysis (around 15 GB in the median case). As a result, we were able to manually confirm vulnerabilities only in Microsoft Office documents. Despite this limitation, we were still able to determine through manual inspection which PDF documents contained malware.



¹ Hereafter, zero-day vulnerabilities refer to vulnerabilities that were not publicly disclosed at the time of the attack.
Table 2: List of well-known vulnerabilities exploited by malicious documents. Release corresponds to the release date of the vulnerability and First to its first exploitation in our dataset (in number of days relative to the release date). Resolved corresponds to the number of Microsoft Office vulnerabilities that were mistagged in AV reports but that we were able to resolve using taint-assisted manual analysis.

Finally, our defense analysis was performed in bulk, after the time of the attacks. Because of the difference between the times of attack and analysis (up to four years for the first malicious documents), the detection rates reported hereafter should be treated as upper bounds: the AV signatures at the time of the analysis were more up-to-date than they would have been at the time of the attack.

4.2 Results: Attack vectors 
4.2.1 Malicious archives

We observed numerous targeted attacks leveraging social engineering and human failure to install malware on the victim's computer. In particular, we found 247 RAR and 49 ZIP archives containing malicious EXEs. In 10 cases, the malicious archives were password-protected, with the password included in the email's body. We hypothesize that archiving was used as a rudimentary form of packer to help the malware evade detection by the distribution channels. Finally, we found that EXEs contained in the archives used an icon that resembled a non-EXE file, namely a DOC, JPEG, or PDF icon, in 20%, 19%, and 7% of the cases, respectively.
Figure 9: Timeline of the targeted vulnerabilities. The Y axis corresponds to CVEs, and each circle represents the number of times the CVE was seen each month after the public disclosure of the vulnerability (day 0). All vulnerabilities were first targeted after their public disclosure.



4.2.2 Malicious documents 

We used taint-assisted analysis to resolve the conflicts 
due to AV mistagging and summarize the CVE informa- 
tion in Table 2. The number of conflicts resolved using 
taint-assisted manual analysis is reported in the last col- 
umn Resolved. Additional taint-analysis results are re- 
ported in Appendix B. 

Zero-day versus unpatched vulnerabilities. We find no evidence of the use of zero-day vulnerabilities in our dataset, but several uses of disclosed vulnerabilities within the same week as their public release date. In addition, Figure 9 shows that vulnerabilities continued to be exploited for years after their disclosure, and this confirms that unpatched vulnerabilities represent a large fraction of the attacks in our dataset. To ascertain the CVE being exploited in each sample, we used a combination of the telemetry data available in the CVE tags generated by AVs and a manual analysis to resolve cases where the tag was ambiguous. For each sample, we then manually recovered the public disclosure date of the vulnerability and treated it as the corresponding day zero. By comparing this date with the time of use in our email dataset, we are able to ascertain the lifetime of vulnerability exploits.
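The lifetime computation described above amounts to simple date arithmetic. The sketch below is illustrative: it assumes each observation has already been reduced to a (CVE identifier, date seen) pair and that disclosure dates come from a lookup table, such as one built from NVD; the identifiers used in testing it are placeholders. A negative offset would indicate use before public disclosure, i.e., a potential zero-day.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def days_since_disclosure(disclosed: date, seen: date) -> int:
    """Offset in days relative to public disclosure (day 0)."""
    return (seen - disclosed).days


def months_since_disclosure(disclosed: date, seen: date) -> int:
    """Whole months elapsed since disclosure, e.g. for monthly timelines."""
    months = (seen.year - disclosed.year) * 12 + (seen.month - disclosed.month)
    if seen.day < disclosed.day:
        months -= 1  # the current month is not yet complete
    return months


def first_use_offsets(
    uses: Iterable[Tuple[str, date]], disclosure: Dict[str, date]
) -> Dict[str, int]:
    """Map each CVE to the offset (in days) of its earliest use in the dataset."""
    first: Dict[str, int] = {}
    for cve, seen in uses:
        offset = days_since_disclosure(disclosure[cve], seen)
        if cve not in first or offset < first[cve]:
            first[cve] = offset
    return first
```

The per-CVE minimum corresponds to a "days to first exploitation" value, while the monthly offsets are what a timeline of exploitation after day 0 would plot.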

We find several instances of exploits in our dataset that were also used in publicly reported targeted attacks. For instance, vulnerabilities such as CVE-2009-4324, CVE-2010-3654, and CVE-2010-2883 have been reported to be zero-day vulnerabilities [6]. However, in our dataset, these vulnerabilities were used only after their disclosure.

Figure 10: Detection rates of popular webmails for the malicious documents. The efficacy of webmails to detect malicious documents varies widely.

Evolution of target applications. Our data shows a sudden switch from Adobe Reader to the Microsoft Office suite as the primary targeted application as of November 2010, as seen in Figure 8. The timing of this switch correlates with two events: (a) the deployment of sandboxing defenses in Adobe Reader and (b) the disclosure of vulnerabilities in the Office suite. The first version of Acrobat Reader to support sandboxing on Windows (version 10.0) was released on November 15, 2010. Within the same month, a stack buffer overflow in Microsoft Office was publicly disclosed, reported as CVE-2010-3333. We see this CVE being massively exploited in our dataset as of January 2011, a time lag of two months. We observe that CVE-2010-3333 was replaced by CVE-2012-0158 in January 2013. This evidence suggests that attackers adapted their attack vectors to use newly disclosed vulnerabilities within a few days to a few months of disclosure, and that updates to the security design of software reduce its exploitability in the wild (as one would expect).

4.3 Results: Bypassing common defenses 

We now investigate the efficacy of existing defenses against malicious documents.

Figure 11: Detection rates of malicious documents for each of the top 30 AVs as reported by VirusTotal. No single AV detected all malicious documents despite their use of well-known vulnerabilities.

Email / Webmail Filtering. Despite the retrospective nature of our analysis, we find that the detection rates of malicious documents for GMail, Hotmail, and Yahoo were still relatively low (see Figure 10). We also find that GMail failed to detect most malicious documents sent after March 2012. In particular, while its detection rate for documents sent before March 2012 was 73%, it was only 28% after that date. Interestingly, 71% of GMail's true positives after March 2012 corresponded to RTF files in which all \r\n character sequences had been substituted with the \n character. While this substitution left the malware in place, we observed that it broke the shellcode embedded in these documents, which requires the document size to remain unchanged to function properly. As a result, the malware was never executed. Although we cannot verify the purpose of this substitution, we note that its appearance coincided with that of the malicious RTF files. We conclude this discussion by pointing out that Yahoo's low detection rate is noteworthy given that Yahoo claims to use Symantec AV for its webmail service [12], which, as we will see below, has a much higher detection rate.
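Why a whitespace rewrite can disarm an exploit without removing it can be illustrated with a toy model. The sketch below assumes, hypothetically, shellcode that locates its embedded payload at a hard-coded offset within the document; the marker string and document layout are invented for illustration and do not describe the actual samples. Replacing each \r\n with \n shortens everything before the payload, so the hard-coded offset no longer points at it.

```python
from __future__ import annotations

MARKER = b"PAYLOAD:"  # hypothetical marker preceding the embedded executable


def build_document(body: bytes, payload: bytes) -> tuple[bytes, int]:
    """Return a toy document and the offset an exploit author would hard-code."""
    return body + MARKER + payload, len(body)


def toy_shellcode(doc: bytes, hardcoded_offset: int, size: int) -> bytes | None:
    """Mimic shellcode that seeks to a fixed offset and expects the marker there."""
    end = hardcoded_offset + len(MARKER)
    if doc[hardcoded_offset:end] != MARKER:
        return None  # payload not where expected: nothing is dropped or run
    return doc[end:end + size]


def normalize_newlines(doc: bytes) -> bytes:
    """The rewrite observed in delivered RTF files: every CRLF becomes LF."""
    return doc.replace(b"\r\n", b"\n")
```

After `normalize_newlines`, the payload bytes are still present in the document, but `toy_shellcode` returns `None` because the marker has shifted by the number of removed `\r` characters.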

Signature-based AV Scanning. In Figure 11, we show the detection rates for the top 30 VirusTotal AVs, sorted by decreasing detection rate of the malicious documents. There are two main takeaways from this graph. First, no single vendor detected all original malicious documents, even though we have seen that they used well-known vulnerabilities. For example, Qihoo, the vendor with the best overall efficacy, was unable to detect 3% of the malicious documents based on scanning. Second, we observe large variations in efficacy among AV vendors: the detection rate dropped by 30% from the first to the twentieth AV (CAT QuickHeal), and the 15 AVs with the lowest detection rates (not shown) all had detection rates below 35%.
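The per-AV rates plotted in Figure 11 are simple ratios over the scanned samples. The sketch below assumes VirusTotal results have already been reduced to a hypothetical mapping from sample to per-engine verdicts; engines that returned no verdict for a sample are left out of that sample's denominator, which is one reasonable convention among several.

```python
from typing import Dict, List, Tuple

ScanResults = Dict[str, Dict[str, bool]]  # sample id -> {AV name: detected?}


def detection_rates(results: ScanResults) -> List[Tuple[str, float]]:
    """Per-AV detection rate, sorted by decreasing rate (ties broken by name)."""
    scanned: Dict[str, int] = {}
    detected: Dict[str, int] = {}
    for verdicts in results.values():
        for av, hit in verdicts.items():
            scanned[av] = scanned.get(av, 0) + 1
            detected[av] = detected.get(av, 0) + int(hit)
    rates = [(av, detected[av] / scanned[av]) for av in scanned]
    rates.sort(key=lambda item: (-item[1], item[0]))
    return rates


def missed_by(results: ScanResults, av: str) -> List[str]:
    """Samples that a given AV scanned but did not flag."""
    return sorted(s for s, verdicts in results.items() if verdicts.get(av) is False)
```

A best-performing vendor still having a non-empty `missed_by` list is the situation described above for Qihoo.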
Summary of Findings. We found that malicious documents are the most popular attack vectors in our dataset, followed by malicious archives. Malicious documents tended to use newly released vulnerabilities, sometimes within a week of their disclosure, continued to use them for several years, and most of them used well-known instead of zero-day vulnerabilities. In particular, our taint-assisted manual analysis of Office documents did not reveal a single zero-day vulnerability in our dataset. Yet neither the webmail filters nor any single commercial AV that we evaluated detected all of these well-known attacks. Furthermore, we found that malicious archives often contained EXE files that masquerade as pictures or documents.

Table 3: Summary of the malware clusters. For each cluster, we show the malware family (or an ID if we could not determine it), the number of malicious emails containing the malware, the number of Command and Control (C2) servers, and the similarities, in terms of communication protocols and C2, with malware attributed to known entities (entity(Com, C2)). Our dataset contains several families of first-stage malware previously seen in targeted attacks carried out in the wild.

5 Malware analysis 

We now analyze the first-stage malware found in malicious documents. Unlike the Mandiant report, which provides an analysis of malware that targets different organizations and that (they claim) originates from the same group, our analysis focuses on all malware in our dataset that has targeted a single organization. By looking at targeted attacks from the perspective of the target rather than the attacker, our analysis enables us to determine whether WUC has been targeted with the same or different malware over the years. We also take a different approach from the authors of the GhostNet report, who performed malware analysis on a few compromised systems belonging to different but related organizations. We instead analyze over six hundred malware samples used to establish a foothold on the targeted systems of a single organization. Our analysis differs from the related work in its scale and context [16] or focus [10]. This section aims to answer the following question:

• Is WUC targeted with the same or different mal- 
ware? In the latter case, are there similarities be- 
tween this first-stage malware and others found in 
targeted attacks in the wild? 

5.1 Methodology 

Our analysis below was performed on 689 malware samples that we extracted from malicious documents.

Clustering. To make our analysis tractable for 689 malware samples, we started by clustering the malware based on its behavior. To do so, we ran the malicious EXE and DLL files in a disconnected sandboxed environment and hooked the function calls used to resolve domain names and establish network communications. In addition, to obtain the TCP port number used for communication, we intercepted calls to gethostbyname and returned a dummy routable IP address. As a result, the malware subsequently reveals the port number when it initiates a connection with the returned IP. (See Appendix C for the complete list of Command and Control (C2) domains.) Finally, we generated behavioral profiles for 586 samples, clustered them using an approach similar to [5, 14], and manually verified the accuracy of the resulting clusters.
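The port-discovery trick described above can be mimicked in Python by temporarily replacing functions of the standard `socket` module. This is only an analogy under stated assumptions: our analysis hooked calls made by native Windows samples, whereas this sketch intercepts calls made by Python code in the same process. `toy_sample` and its domain are hypothetical stand-ins for a malware sample and its C2 name, and 192.0.2.1 is a documentation address chosen as the dummy IP.

```python
import socket
from contextlib import contextmanager

# Placeholder answer for every DNS lookup: 192.0.2.1 is reserved for
# documentation (RFC 5737), so nothing real is ever contacted.
DUMMY_IP = "192.0.2.1"


@contextmanager
def intercept_network(log):
    """Record C2 lookups and connection targets made through Python's socket module.

    Inside the block, gethostbyname() always returns DUMMY_IP and connect()
    logs its (ip, port) argument, then fails instead of sending any traffic.
    """
    real_gethostbyname = socket.gethostbyname
    class_had_connect = "connect" in vars(socket.socket)
    real_connect = socket.socket.connect

    def fake_gethostbyname(hostname):
        log.append(("resolve", hostname))
        return DUMMY_IP

    def fake_connect(sock, address):
        log.append(("connect", address))
        raise ConnectionRefusedError("connection intercepted")

    socket.gethostbyname = fake_gethostbyname
    socket.socket.connect = fake_connect
    try:
        yield log
    finally:
        socket.gethostbyname = real_gethostbyname
        if class_had_connect:
            socket.socket.connect = real_connect
        else:
            del socket.socket.connect  # fall back to the inherited C method


def toy_sample():
    """Hypothetical stand-in for a sample: resolve a C2 name, then connect."""
    ip = socket.gethostbyname("c2.example.invalid")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect((ip, 8080))
    except ConnectionRefusedError:
        pass  # a real sample might retry or sleep here
    finally:
        sock.close()
```

Running `toy_sample()` inside `intercept_network(log)` leaves both the C2 name and the port (8080) in `log`, which is the information the hooking step is meant to reveal.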
Malware family and similarities. Similarly to Bailey 
et al. [5], we found that determining the malware fam- 
ily using AV signature scanning was unproductive. To 
determine whether our malware shares similarities with 
other known targeted malware, we relied on several re- 
ports on targeted attacks [9, 13]. We extracted the C2 do- 
mains and, when available, additional information about 
the malware (e.g., hashes and behavior) from these re- 
ports. Finally, we correlated the domains, IP addresses, 
hashes, and behavioral profiles with those from the re- 
ports in order to find similarities between the different 
sets of malware. We performed this analysis in February 
2014. 
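The indicator correlation above reduces largely to set intersection. The sketch below is a minimal illustration under assumptions: indicators are taken to be already extracted from our samples and from the reports, normalization is limited to case, whitespace, and trailing dots, and behavioral profiles (which require closer comparison) are not modeled. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Indicators:
    """Indicators of compromise for one malware cluster or one published report."""
    domains: Set[str] = field(default_factory=set)
    ips: Set[str] = field(default_factory=set)
    hashes: Set[str] = field(default_factory=set)  # e.g., MD5 hex digests


def _norm_domain(domain: str) -> str:
    return domain.strip().lower().rstrip(".")


def shared_indicators(ours: Indicators, theirs: Indicators) -> Dict[str, Set[str]]:
    """Indicators present in both sets, after light normalization."""
    return {
        "domains": {_norm_domain(d) for d in ours.domains}
        & {_norm_domain(d) for d in theirs.domains},
        "ips": {ip.strip() for ip in ours.ips} & {ip.strip() for ip in theirs.ips},
        "hashes": {h.lower() for h in ours.hashes} & {h.lower() for h in theirs.hashes},
    }


def correlate(
    clusters: Dict[str, Indicators], reports: Dict[str, Indicators]
) -> List[Tuple[str, str, Dict[str, Set[str]]]]:
    """List every (cluster, report) pair sharing at least one indicator."""
    matches = []
    for cluster_name, ours in clusters.items():
        for report_name, theirs in reports.items():
            overlap = shared_indicators(ours, theirs)
            if any(overlap.values()):
                matches.append((cluster_name, report_name, overlap))
    return matches
```

A shared domain or hash only flags a candidate link; confirming it still requires comparing behavior, as the results below do.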

Limitations. Our behavioral analysis was performed in a disconnected environment and, as a result, it is limited to the first stage of the malware behavior. Studying the behavior of additional payloads that would be downloaded after the compromise is beyond the scope of this paper and will be the subject of future work.

5.2 Results 

We now analyze the malware clusters and their similarities with other targeted malware found in the wild.
Cluster sizes. We find that 57% of our malware belonged to the ten largest clusters (we show additional information about these clusters in Table 3). In total, five clusters (two in the top ten) used at least one of dtl6.mooo.com, dtl.dnsd.me, or dtl.eatuo.com as their C2 domains, indicating some operational link between them. In fact, at the time of analysis, these three domains resolved to the same IP address, and the malware in each cluster connected to different ports of the same server. Despite these apparent similarities, manual analysis of the behavioral logs revealed that their logic differed from one cluster to another, explaining their assignment to different clusters. Combined, these five clusters represented 24% of the malware that we analyzed.
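That samples sharing C2 infrastructure can still fall into different clusters follows naturally from behavior-based clustering. The sketch below is a simplified illustration, not the exact procedure of [5, 14]: each sample is represented by a set of behavioral features, samples are compared with the Jaccard distance, and single-linkage clustering merges pairs whose distance is below a threshold. The feature strings and the threshold are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |intersection| / |union|; two empty profiles are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def single_linkage(profiles: dict[str, frozenset], threshold: float) -> list[set[str]]:
    """Merge samples (transitively) whose profiles are within `threshold` distance."""
    parent = {name: name for name in profiles}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Two samples that resolve the same C2 name but differ in port, persistence mechanism, and dropped files share only one of seven distinct features, so their distance (6/7) keeps them apart at a threshold of 0.3.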

Malware family and similarities. We found various degrees of similarity between our clusters and targeted attacks reported in the wild. First, the five clusters above used the same C2 as the DTL group reported by FireEye in November 2013, and the malware reported by FireEye was of the same family as one of these clusters (APT.9002, not shown) [9]. In particular, we found that one of our samples in that cluster had the same MD5 hash as those described in the FireEye report and that eight had identical manifest resources. FireEye claims that this malware has been used in targeted attacks against various governmental and industrial organizations.

Second, malware in the Surtr cluster had the same be- 
havioral profile as samples used against the Tibetan com- 
munity in March 2012 [7]. Although the two sets of sam- 
ples had different MD5 hashes, they both connected to 
the same C2 server (shared with APT.9002) on the same 
port number, and exhibited the same behavior to establish persistence on the victim's machine.

Third, our 13 TravNet samples exhibited behavior similar to that of samples used against Indian targets in 2013 [2]. To establish this, we obtained the samples used in India, generated their behavioral profiles, and compared them manually with the malware in our TravNet cluster. Although the two sets connected to different C2 servers and exhibited variations in the way they searched the victims' file systems, we found that they both used the same communication protocol with the C2.

Fourth, samples in another cluster communicated with 
the same C2 server and exhibited the same behavior as a 
Vidgrab sample found in a malicious document sent to a 
victim in Hong Kong in August 2013 [8]. 

Summary of findings. We found that WUC has been targeted with several malware families over the last few years. We also showed that the Surtr and APT.9002 clusters corresponded to malware that Citizen Lab and FireEye identified as having targeted the Tibetan community, as well as other political and industrial organizations [7, 9]. Furthermore, 24% of our malware (including Surtr, APT.9002, and three other clusters) had at least one C2 domain in common, which was identical to those in the Citizen Lab and FireEye reports.



6 Future Work 

Several directions for future work arise from this paper. We briefly discuss them below.

Attack vectors and generalization. Our analysis is limited to attack vectors used against WUC. Similar studies on a wider range of targets would help us better understand this emerging threat. Further, our analysis covers only attack vectors distributed over email channels, which leads to two main limitations. First, it is possible that our volunteers have been attacked via channels other than email. Second, although we have seen various organizations targeted with the same malware as WUC, it is generally hard to determine with certainty which victims were the primary targets of these attacks. Therefore, it is possible that other victims have been targeted with additional attack vectors when the attacks did not involve WUC. Further research is needed to overcome these limitations.

Exploring the different channels that attackers use for distributing malicious payloads is important. As a step towards this goal, we are currently collaborating with the Safebrowsing team at Google to investigate the emergent threat of watering-hole attacks. These attacks are conceptually very similar to drive-by download attacks, with one key difference: they compromise very specific websites commonly visited by the targeted community (e.g., a company's website) and wait for victims to visit them. Compared to spear phishing, watering-hole attacks offer the advantage of potentially targeting a fairly large number of victims (e.g., all employees of a large company) before raising suspicion. We conjecture that the small number of suspicious links in our dataset may be due to the small size of the targeted organizations and the public availability of their employees' email addresses.

Other attack vectors include, but are not limited to, packet injection to redirect victims to malicious servers (similar to those used in watering-hole attacks) and physical attacks on the victims' devices [3]. Detecting these attacks would require methodologies completely different from the one we used in this paper.
Monitoring. We have seen that a few high-profile members of the Uyghur community were compromised and that their email accounts were being used as stepping stones to carry out targeted attacks. Although it is possible that these email accounts were compromised via targeted attacks, we have not yet confirmed this hypothesis. More generally, we do not yet know the specific aim of these targeted attacks. Monitoring the full lifecycle of targeted attacks would require novel measurement systems, deployed at the end users, that can identify compromises without being detected.

Pinpointing the geolocation of attackers carrying out targeted attacks, or attack attribution, is another open monitoring challenge. Marczak et al. were able to attribute targeted attacks to governments in the Middle East by analyzing cause-and-effect relationships between compromises and real-world consequences [16]. In contrast to monitoring and attack attribution, this paper has presented an extensive, complementary analysis of the life cycle of targeted attacks before the compromise.
Large-scale malware analysis and clustering. We found it challenging to (a) cluster targeted malware and (b) locate similar samples. First, different targeted malware sometimes exhibits significant similarity in its logic and may also use the same Command and Control (C2) infrastructure. As a result, traditional clustering algorithms tend not to work very well. Second, we located similar samples based on a limited set of indicators, such as C2, cryptographic hashes, or YARA signatures; however, we feel that our current capability in that respect has a lot of room for improvement. We foresee that a search engine that can, for example, locate malware matching certain indicators in an arbitrarily large corpus would be a useful instrument for researchers working on targeted attacks.

Our analysis of CVEs highlights that telemetry data from commercial AVs is not always reliable. Our analysis, complemented with taint analysis, was largely manual and time-intensive. Analysis techniques that quickly diagnose known CVEs directly from given exploits remain an open problem, and perhaps one of independent interest.
Defenses. Our findings confirm that AVs may miss known CVEs, even years after their release dates. Clearly, known CVEs contribute a large part of the emerging threat of targeted attacks. Understanding conclusively why commercial AVs miss known attacks (for example, whether vendors trade security for fewer false positives or better performance) is an important research direction. Designing effective defenses against targeted attacks is a major research challenge that depends on our ability to understand the threat at hand. As part of future work, one could evaluate the effectiveness of novel defenses based on the findings from this paper. As a small step towards that goal, we plan to soon deploy a webmail plugin that combines metadata and stylometry analysis [17] to detect contact impersonation.

7 Conclusion 

We have presented an empirical analysis of a dataset capturing four years of targeted attacks against a human-rights NGO. First, we showed that social engineering was an important component of targeted attacks, with significant effort invested in crafting emails that look legitimate in terms of topics, languages, and senders. We also found that victims were targeted often, over the course of several years, and simultaneously with colleagues from the same organization. Second, we found that malicious documents with well-known vulnerabilities were the most common attack vectors in our dataset and that they tended to bypass common defenses deployed in webmails or users' computers. Finally, we provided an analysis of the targeted malware and showed that over a quarter of samples exhibited similarities with entities known to be involved in targeted attacks against a variety of industries. We hope that this paper, together with the public release of our malware dataset, will facilitate future research on targeted attacks and, ultimately, guide the deployment of effective defenses against this threat.
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A Targeted organizations 



Organization 


# Recipients 


# Enicifls 


Firsi-Lasi 


World Uyghur Congress (WUC) 


53 


2,366 


2009-2013 


East Turkestan Union in Europe (ETUE) 




153 


2010-2013 


Ausii;iliLi[i IJ; ^liui' .Association 




129 


2009-20 1 3 


Euro-AsiLi FiHiiidaiian m Turkey 


- 


101 


20 1 0-20 1 3 


Uyghur Ciinadian AsMiciation 




9K 


2009-20 1 3 


Gerjiiany Uygliur Women Commiftee 




82 


2009-2013 


Radio Free Asia (RFA) 


12 


80 


2010-2013 


France Uyghur Association 


5 


80 


2009-2013 


Eastern "nirkestan Australian Association (ETAA) 




77 


2009-2013 


Uyghur American Associalion (UAA) 




72 


2010-2013 


Eastern Turkestan Uyghur Association in Netherlands 






2010-2013 


Netheiland Uyghur Union 




60 


2012-2013 


United Nations for a Free Tibet (UK) 




57 


2011-2013 


Eastern Turkestan Culture and Solidarity Associalion 


13 


53 


2009-2013 


Wktoria Uyghur Association 




48 


2010-2013 


Japan Uyghur Association 






2012-2013 


Switzerland East 'Hirkestan Association 




AT 


2010-2013 


Hacettepe University Turkey 




A1 


20 1 0-20 1 3 


Kazakhstan Academy of Poetry 




V. 


2009-2013 


Belgium Uyghur Association 




35 


2009-2013 


Kyrgy/.sian Uygluir Association 




33 


201 1-2013 


UyghLir (. aiiLiJiLiii Sulk-in 




31 


2009-20 1 3 


UyghLir .X.aJL-mv 




25 


2009-20 1 3 


MuniL-]i I Ngliiii IJlL-is Mcshrcp 




22 


2012-2013 


Republican l"\ l:]iui C iiliiiral Center of Kazakhstan 




22 


2009-2013 


Sweden I Ngliui As-iviation 






2010-2013 


Virgiiua DepiLi'iiiiciit nl Social Services 






2009-20 12 


Unrepresented Naiuni' and IVoples Organization (UNPO) 






20 1 0-20 1 3 


Sociale VerzekcriiiL:sl)aiik (SN'Bi NGO Netherland 






2012-2013 


China Democratic Party (COP) 






2009-201 1 


Finland Uyghur Association 






2013 


Jet Propulsion Laboratory, founded by NASA 




5 


2012 


Pennsylvania State University US 




5 


2010-2013 


Uyghur Support Group Nederland 






2010-2013 


Norway Uyghur Committee 






2010-2013 


Amnesty International 






2010-2012 


Association of European Border Regions (AEBR) 






2010-201 1 


Howard University US 






2012-2013 


Initiatives for China 






2009-20 10 


LSE Asia Research Center and Silk Road Dialogue 






201 2-20 13 


The Government- in-Exile of the Republic of East Ttakistan 


- 




20 1 0-20 1 1 


Uyghur Human Rights Project (UHRP) 


- 




2010 


Australian Migration Options Pty Ltd 






2010 


Agence France-Presse 






2013 


National Endowment for Democracy (NED) 


- 




2010-2012 


PEN International 




3 


2009-2013 


Syracu^c Uni\cr-ii\ I'S 






2013 


Worldwide Pi\i(i'--i in 1 luiii^: aijJ Si:|i|Kiri ol L'yghiirs Dying for Freedom 






2013 


Australian Govcniniciii - Dcpaiiinenl of Foreign Affairs and Trade 




3 


2010 


New Tang Dynasty Television China 






2010 


The Epoch Times 




3 


2010 


Ministry of Foreign Affairs Norway 






2013 


International University of Kagoshima Japan 






2013 


Association of Islam Rehgion 




2 


2013 


Bilkent University TUricey 






2011-2012 


Embassy of Azerbaijan in Beijing 






2010 


Indiana University School of Law-IndianapoUs LL.M. 






2012 


KYOCERA Document Solutions Development America 






2013 


New York Times 






2009 


Pfizer Government Research Laboratory - Clinical Hiarmacology 






201 1-2012 


Saudi Arabia - Luggage Bags and Cases Company 






201 ^ 


Students foi' a Free Tibet 






2010 


Sweden L vglinr Etiiicatioii Union 






2010-2013 


Uyghur International Culture Center 






2012 


The Protestant Church .Amsterdam 






2010 


Swiss Agency foi' Development and Cooperation (SDC) Kargyzstan 








American Bai' Association foi' Attorneys in US 






2010 


Assistance for Work Germany Frankfurt 






2010 


Bishkek Human Rights Committee 








Central Tibetan Administration (CTA) 








Chinese Translation CtHmnercial Business 








Circassian Cultural Center {CHKTS) 






•7 mo 


Colombian National Radio 








Embassy of the United States in Australia 








Europa Haber Newsp^ter TUrk^ 






on 
2010 


Europe-China Cultural Commumcation (ECCC) 






2011 


Freelance Reporter and writer Turkey 






2012 


Goethe University Frankfurt am Main Germany 






2012 


Human Rights Campaign in China 






2010 


International Enterprise (IE) - Singapore Government 






2010 


International Tibet Independence Movement 






2010 


Jasmine Revolution China (Pro-Democracy Protests) 






2009 


Socialist Party (Netherlands)






2011 


Los Angeles Times 






2010 


Milli Gazete (National Newspaper Turkey)






2010 


Norwegian Tibet Committee






2010 


Photographer Turkey 






2012 


CNN International Hong Kong 






2012 


Reporters Without Borders






2012 


Republican National Lawyers Association Maryland 








Save Tibet - International Campaign for Tibet






2010 


Society for Threatened People (STPI) 






2012 


Southern Mongolian Human Rights 






2012 


Stucco Manufacturers Association US 






2013 


Superior School of Arts France






2012 


The George Washington University 






2013 


TurkishNews Newspaper 






2010 


US Bureau of Transportation Statistics 






2009 


Umit Uighur Language School 






2010 


Union of Turkish-Islamic Cultural Associations in Europe 






2012 


University of Adelaide Melbourne 






2010 


University of Khartoum Sudan 






2012 


US Embassy and Consulate in Munich Germany






2011 


Wei Jingsheng Foundation






2009 


Xinjiang Arts Institute China






2010 


Yenicag Gazetti (Newspaper Turkey) 






2010 


American University 






2012 


Islamic Jihad Union 






2012 
B Dynamic taint-assisted analysis of malicious documents
B.1 Methodology

We use BitBlaze [21] to perform dynamic taint-tracking analysis of the targeted applications, with the malicious documents as input, and configure it to report four kinds of events: (a) when a tainted Extended Instruction Pointer (EIP) is executed, (b) when a memory fault is triggered in the target program, (c) when a new process is spawned from the target program, or (d) when the analysis "times out" (i.e., runs without interruption for over 15 minutes). To mark the malicious documents as a taint source, we tainted the network inputs and delivered each malicious input file over the network using netcat. Additionally, we configured BitBlaze to stop tracing upon detecting null pointer exceptions, user exceptions, tainted EIPs, and process exits before the start of the trace. We generated a tag from each trace by taking the last instruction with tainted operands and matching it against the list of loaded modules produced by TEMU. Our guest (analysis) image consists of a clean installation of Windows XP SP2 with the TEMU drivers and Microsoft Office 2003.
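To make the tagging step concrete, the following minimal Python sketch matches the address of the last tainted instruction against a list of loaded modules and emits a module-relative tag. The `Module` record, the `module+offset` tag format, and the load addresses in the usage example are illustrative assumptions, not the actual BitBlaze or TEMU output formats.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Module:
    """A loaded module (hypothetical representation of the emulator's list)."""
    name: str
    base: int  # load address
    size: int  # size of the mapped image in bytes


def module_for_address(modules: Sequence[Module], addr: int) -> Optional[Module]:
    """Return the module whose [base, base + size) range contains addr, if any."""
    ordered = sorted(modules, key=lambda m: m.base)
    bases = [m.base for m in ordered]
    i = bisect_right(bases, addr) - 1  # last module starting at or below addr
    if i >= 0 and addr < ordered[i].base + ordered[i].size:
        return ordered[i]
    return None


def make_tag(last_tainted_addr: int, modules: Sequence[Module]) -> str:
    """Tag a trace by the module and offset of its last tainted instruction."""
    module = module_for_address(modules, last_tainted_addr)
    if module is None:
        return f"unmapped@0x{last_tainted_addr:x}"
    return f"{module.name}+0x{last_tainted_addr - module.base:x}"


if __name__ == "__main__":
    # Hypothetical load addresses, for illustration only.
    loaded = [Module("WINWORD.EXE", 0x30000000, 0x200000),
              Module("MSO.DLL", 0x32600000, 0xE00000)]
    print(make_tag(0x32612345, loaded))  # MSO.DLL+0x12345
```

Tagging by module and offset, rather than by absolute address, is one way to let traces whose last tainted instruction falls in the same library code (such as the MSO.DLL cases discussed below) collapse onto the same tag.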

B.2 Results 

Anti-virus software typically analyzes malware using static signature matching or whitelisting techniques. To validate the results available from commercial AV, we ran a separate, semi-automated dynamic analysis of the targeted applications with our malicious documents as input.

Out of 817 unique input documents (725 malicious and 92 legitimate), 295 timed out under our BitBlaze analysis without reporting a tainted EIP, a memory fault, or a newly spawned process.^ Another 13 were incompatible with our analysis infrastructure because they use the more recent DOCX format. Since we could not compare these cases directly to the results obtained from VirusTotal, we focus on the remaining 509 malicious documents in the evaluation.
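The size of the evaluation set follows directly from the counts in the preceding paragraph. This short Python check restates that arithmetic; the variable names are ours, and the figures come from the text.

```python
# Documents entering the BitBlaze evaluation (counts from the text above).
total_documents = 725 + 92   # malicious + legitimate inputs
timed_out = 295              # no tainted EIP, memory fault, or new process in 15 minutes
docx_incompatible = 13       # newer DOCX format, not runnable in the guest image

evaluated = total_documents - timed_out - docx_incompatible
print(total_documents, evaluated)  # 817 509
```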
Efficacy of Taint EIP Detection. Taint-tracking detected tainted EIP execution in 477 of the 509 documents. In 19 of the 32 undetected cases, however, a new process was spawned without taint-tracking detecting a tainted EIP. We treat these as false negatives of taint-tracking. We speculate that they are likely due to missed direct flows, untracked indirect flows (via control dependencies or table lookups), or non-control-flow hijacking attacks (such as argument corruption). The remaining 13 documents did not lead to tainted EIP execution but instead caused a memory fault. This could be due to a difference between our test infrastructure and the victim's, or to an attempt to evade analysis. In 33 of the 477 cases where a tainted EIP was detected, no new process was spawned, and the tainted EIP instruction did not correspond to any shellcode. All of these cases correspond to a particular instruction in MSO.DLL, a dynamic-link library shipped with Microsoft Office installations, that triggers the tainted EIP detection. To understand these cases better, we manually created blank benign documents and fed them to Microsoft Office; they too triggered tainted EIP detections. We therefore treat these cases as false positives of taint detection, possibly caused by benign dynamic code generation. All the remaining cases (i.e., 444 of the 477) are legitimate exploits that we could confirm to execute shellcode.

[Figure 12: stacked bar chart with main bars Tainted, Timeout, Spawned, Fault, and DOCX; each bar is stacked by VirusTotal tag (Single, Multiple, Heuristic, Unknown).]

Figure 12: Breakdown of dynamic taint-assisted analysis and comparison to VirusTotal AV results. Single, Multiple, Heuristic, and Unknown correspond to the different AV tags assigned to documents. The main bars show the detection result from BitBlaze: (a) detected by tainted EIP execution, (b) timeout, (c) spawned process without tainted EIP execution, (d) memory fault without tainted EIP execution, and (e) DOCX, unable to run in our analysis environment. Within each main bar, each stacked segment represents the corresponding tag given by VirusTotal.
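The outcome counts for the 509 evaluated documents are internally consistent, as this small Python check of the figures given in the text shows. The variable names are ours; the detection rate is simply derived from the reported counts.

```python
# Breakdown of the 509 evaluated documents (counts from the text above).
evaluated = 509
outcomes = {
    "tainted_eip": 477,    # tainted EIP execution detected
    "spawned_only": 19,    # new process, no tainted EIP: taint false negatives
    "fault_only": 13,      # memory fault, no tainted EIP
}
assert sum(outcomes.values()) == evaluated

mso_dll_false_positives = 33   # tainted EIP in MSO.DLL, no shellcode, no new process
confirmed_exploits = outcomes["tainted_eip"] - mso_dll_false_positives
detection_rate = outcomes["tainted_eip"] / evaluated

print(confirmed_exploits)          # 444
print(f"{detection_rate:.1%}")     # 93.7%
```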
Dynamic Taint versus VirusTotal. Figure 12 shows a detailed comparison of the taint-assisted classification of vulnerabilities against the results from VirusTotal. Of the 477 documents on which a tainted EIP was detected, VirusTotal tagged 397 with one or more CVEs. Of the remaining 80, 24 were undetected by VirusTotal, and 56 were detected but marked Unknown (i.e., no CVE assigned). Dynamic taint analysis helped refine the AV results for almost all of these 56 Unknown cases: for 55 of them, taint-assisted manual analysis was able to resolve the exploited CVE.
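The VirusTotal split of the 477 tainted-EIP documents can be checked with a few lines of Python; the numbers come from the text and the names are ours.

```python
# Split of the 477 tainted-EIP documents by VirusTotal outcome.
tainted_eip = 477
vt_split = {
    "cve_tagged": 397,   # at least one CVE reported by VirusTotal
    "undetected": 24,    # no AV detection
    "unknown": 56,       # detected, but no CVE assigned
}
assert sum(vt_split.values()) == tainted_eip

resolved = 55            # Unknown cases resolved to a CVE with taint-assisted analysis
print(f"{resolved / vt_split['unknown']:.0%}")  # 98%
```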

Of the 397 tainted-EIP documents that VirusTotal tagged with one or more CVEs, our taint-assisted manual analysis agrees with the VirusTotal CVE tag on 372. That is, for 372 documents we detected execution of a tainted EIP that we could manually correlate to a single CVE, the same one reported by a majority of the AVs in VirusTotal.^ Thus, for a large majority of the cases, taint-assisted analysis agrees with the AV results. Of the remaining 25 cases, 17 could be identified as AV misclassifications, because the CVE reported by most of the AVs in VirusTotal was not the one that affected the program. The other 8 documents were flagged by taint analysis as false positives even though VirusTotal assigned them a CVE.
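One way to implement the majority-tag rule described in the footnote on AV tags is sketched here in Python. The CVE-extracting regular expression, the strict-majority threshold (the text does not say how ties or mere pluralities were handled), and the example AV labels are our assumptions, not VirusTotal's output format.

```python
import re
from collections import Counter
from typing import Optional, Sequence

# Matches CVE identifiers embedded in AV labels, e.g. "Exploit.CVE-2012-0158.Gen".
CVE_RE = re.compile(r"CVE[-_]?(\d{4})[-_]?(\d{4,7})", re.IGNORECASE)


def normalize(label: str) -> Optional[str]:
    """Map an AV label to a canonical 'CVE-YYYY-NNNN' string, or None."""
    m = CVE_RE.search(label)
    return f"CVE-{m.group(1)}-{m.group(2)}" if m else None


def representative_cve(av_labels: Sequence[str]) -> Optional[str]:
    """Return the CVE named by a strict majority of the CVE-bearing labels.

    Returns None when no label names a CVE (the 'Unknown' case) or when no
    single CVE holds a strict majority (an assumed tie-breaking policy).
    """
    cves = [c for c in map(normalize, av_labels) if c is not None]
    if not cves:
        return None
    cve, count = Counter(cves).most_common(1)[0]
    return cve if 2 * count > len(cves) else None


if __name__ == "__main__":
    # Made-up labels from four hypothetical AV engines.
    labels = ["Exploit.CVE-2012-0158.Gen", "Exploit:Win32/CVE-2012-0158",
              "Exploit.MSWord.CVE-2010-3333", "Trojan.Agent"]
    print(representative_cve(labels))  # CVE-2012-0158
```

Normalizing labels first matters because, as the footnote notes, different AVs name the same vulnerability differently; switching to a plurality rule would only require dropping the `2 * count > len(cves)` check.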



We believe 150 of these are due to user interaction, which we could not automate at present; the remaining ones could potentially be analyzed with a faster test platform, which we plan to investigate in the future.



Different AVs in VirusTotal often label the same vulnerability with different tags. We took the tag reported by the majority of the AVs as the representative tag of the sample.
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C Command and Control (C2) servers 



C2 # Emails



61.178.77.169 74 


mzyzy.vicp.net 3 


www.info-microsoft.com 2 


googlehk.dynamicdns.co.uk 1 


dtl.dnsd.me 66 


mygoodbug.dnsd.info 3 


www.uyhanur.ima.cc 2 


113.10.201.254 1 


ns.dns3-domain.com 55 


www.uyghuri.mrface.com 3 


www.micosofts.com 2 


152.101.38.177 1 


dtl.eatuo.com 44 


6.test.3322.org.cn 3


100.4.43.2 2 


blog.sina.com.cn 1 


202.85.136.181 32 


218.82.206.229 3 


61.234.4.214 1 


uyghur.epac.to 1 


update.googmail.org 31


uyghursov.tw 3 


a.yahoohello.com 1 


xinxin20080628.gicp.net 1 


dtl6.mooo.com 29 


3.test.3322.org 3 


bcl516.7766.org 1 


yahOOmail.gicp.net 1 


www.discoverypeace.org 26 


newwhitehouse.org 3 


202.68.226.250 1 


hbnjx.6600.org 1 


58.64.172.177 22 


goodnewspaper.f3322.org 3 


msdn.homelinux.org 1 


humanbeing2009.gicp.net 1 


email.googmail.org 22 


nskupdate.com 3 


207.204.245.192 1 


webhelp01.freetcp.com 1 


news.googmail.org 22 


webmonder.gicp.net 3 


216.131.66.96 1 


mobile.yourtrap.com 1 


61.128.122.147 17 


61.132.74.68 3 


www.avasters.com 1 


125.141.149.23 1 


softmy.jkub.com 15 


61.178.77.108 3 


202.130.112.231 1 


222.73.27.223 1 


61.234.4.213 13 


betterpeony.com 3 


nbsstt.3322.org 1 


www.jiapin.org 1 


dnsmm.bpa.nu 11 


4.test.3322.org 3 


goodnewspaper.3322.org 1 


ibmcorp.slyip.com 1 


121.170.178.221 10 


61.234.4.210 3 


webposter.gicp.net 1 


182.16.11.187 1 


zeropan007.3322.org 10 


9.test.3322.org.cn 3 


uyghurl.webhop.net 1 


star2.ksksz.com 1 


wwzzsh.3322.org 9 


8.test.3322.org.cn 3


webwx.3322.orgxiexie.8866.org 1 


69.197.132.130 1 


222.77.70.237 9 


1.test.3322.org 3


125.141.149.49 1 


www.yahooprotect.com 1 


3.test.3322.org.cn 8 


radio.googmail.org 3 


guanshan.3322.org 1 


xiexie.8866.org 1 


1.test.3322.org.cn 8


7.test.3322.org.cn 3 


leelee.dnset.com 1 


img.mic-road.com 1 


2.test.3322.org.cn 8 


tokyo.collegememory.com 2 


uygur.eicp.net 1 


photo.googmail.org 1 


eemete.freetcp.com 8 


201.22.184.42 2 


kxwss.8800.org 1 


tonylee38.gicp.net 1 


applel2.crabdance.com 8 


61.178.77.96 2 


173.208.157.186 1 


suggest.dnsl.us 1 


wolf001.usl09.eoidc.net 7 


webproxy.serveuser.com 2 


rc.arkinixik.com 1 


worldview.instanthq.com 1 


4.test.3322.org.cn 7 


www.bbcnewes.net 2 


www.uusuanru.ima.ee 1 


goodnewspaper.gicp.net 1 


etdt.cable.nu 6 


done.youtubesitegroup.com 2 


uxz.fo.mooo.com 1 


112.121.182.150 1 


205.209.159.162 6 


alma.apple.cloudns.org 2 


uygur.51vip.biz 1 


abc69696969.vicp.net 1 


br.stat-dns.com 6 


webmailsvr.com 2 


peopleunion.gicp.net 1 


put.adultdns.net 1 


66.79.188.23 6 


polat.googmail.org 2 


freelOOO.gnway.net 1 


loadbook.strangled.net 1


www.southstock.net 6 


religion.xicp.net 2 


uxz.fo.dnsd.info 1 


internet.3-a.net 1 


ns1.3322.net 5


connectsexy.dns-dns.com 2 


wodebeizill9.jkub.com 1 


news.scvhosts.com 1 


121.254.173.57 5 


dns3.westcowboy.com 2 


itsec.eicp.net 1 


98.126.20.221 1 


www.uyghur.25u.com 5


61.220.138.100 2 


stormgo.oicp.net 1 


mydeyuming.cable.nu 1 


202.96.128.166 5 


27.254.41.7 2 


boy303.2288.org 1 


gshjl.3322.org 1 


ns1.oray.net 5


116.92.6.197 2 


webjz.9966.org 1 


foreverO01.dtdns.net 1 


jhska.cable.nu 5 


applel2.co.cc 2 


zbing.strangled.net 1 


grtl.25u.com 1 


testl95.3322.org 5 


58.64.129.149 2 


tommark5454.xxxy.info 1 


66.197.202.242 1 


61.234.4.218 5 


worldmaprsh.com 2 


oyghurl.webhop.net 1


kaba.wikaba.com 1 


61.128.110.37 5 


phinexl27.gicp.net 2 


addi.apple.cloudns.org 1 


221.239.96.180 1 


ns1.china.com 5


wxjz.6600.org 2 


60.170.255.85 1 


174.139.133.58 1 


a2010226.gicp.net 5 


gecko.jkub.com 2 


toolsbar.dns0755.net 1 


125.141.149.46 1 


logonin.uyghuri.com 4 


smtp.126.com 2 


61.132.74.113 1 


frank.3feet.com 1 


macaonews.8800.org 4


errorslog.com 2 


113.10.201.250 1 


115.126.3.214 1 


book.websurprisemail.com 4 


uyghurie.51vip.biz 2 


home.graffiti.net 1 


liveservices.dyndns.tv 1 


desk.websurprisemail.com 4 


tanmii.gicp.net 2 


statistics.netrobots.org 1 


inc.3feet.com 1 


test.3322.org.cn 4 


211.115.207.7 2 


freesky365.gnway.net 1 


lnsmm.bpa.nu 1 


221.239.82.21 4 


59.188.5.19 2 


greta.ikwb.com 1 


www.yahooprotect.net 1 


liveservices.dyndns.info 4 


206.196.106.85 2 


englishclub.2288.org 1 


222.82.220.118 1 


180.169.28.58 4 


religion.8866.org 2 


mm.utf888.com 1 


webwxjz.3322.org 1 


portright.org 4 


68.89.135.192 2 


annchan.mrface.com 1 


61.234.4.220 1 


video.googmail.org 4 


blogging.blogsite.org 2 


www.shine.4pu.com 1 


thankyou09.gicp.net 1 


www.guzhijiaozihaha.net 4 


softjohn.ddns.us 2 


copy.apple.cloudns.org 1 


218.28.72.138 1 


207.46.11.22 4 


report.dns-dns.com 2 


220.171.107.138 1 


soft.epac.to 1 


www.googmail.org 4 


115.160.188.245 2 


uyghuri.mrface.com 1 


www.yahooip.net 1 


2.test.3322.org 3 


newyorkonlin.com 2 


218.108.42.59 1 


msejake.7766.org 1 


dcp.googmail.org 3 


tw252.gicp.net 2 


58.64.193.228 1 


202.67.215.143 1 


test.3322.org 3


61.222.31.54 2 


tt9c.2288.org 1 


www.yahoohello.com 1 


np6.dnsrd.com 3 


tomsoimiartin.ikwb.com 2 


forum.universityexp.com 1 


202.109.121.138 1 
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